Computer software may include automated processes you can use for fitting models. We

discourage you from using these in biostatistics because you want to have a lot of control over

how a model is being fitted to make it possible for you to interpret the results. However, these

processes can be used to create comparison models — or to simulate improved models — which

are perfectly reasonable methods to explore ways to improve your model.

Understanding Interaction (Effect Modification)

In Chapter 17, we touch on the topic of interaction (also known as effect modification). Interaction occurs when

the relationship between an exposure and an outcome depends strongly on the status of another

covariate. Imagine that you conducted a study of laborers who had been exposed to asbestos at work,

and you found that being exposed to asbestos at work was associated with three times the odds of

getting lung cancer compared to not being exposed. In another study, you found that individuals who

smoked cigarettes had twice the odds of getting lung cancer compared to those who did not smoke.

Knowing this, what would you predict are the odds of getting lung cancer for asbestos-exposed

workers who also smoke cigarettes, compared to workers who aren’t exposed to asbestos and do not

smoke cigarettes? Do you think it would be additive — meaning three times for asbestos plus two

times for smoking equals five times the odds? Or do you think it would be multiplicative — meaning

three times two equals six times the odds?

Although this is just an example, it turns out that in real life, the effect of being exposed to both

asbestos and cigarette smoking represents a greater than multiplicative synergistic interaction (meaning

much greater than six) in terms of the odds for getting lung cancer. In other words, the risk of getting

lung cancer for cigarette smokers is dependent upon their asbestos-exposure status, and the risk of lung

cancer for asbestos workers is dependent upon their cigarette-smoking status. Because the factors

work together to increase the risk, this is a synergistic interaction (with the opposite being an

antagonistic interaction).
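
The arithmetic behind the additive and multiplicative guesses can be sketched in a few lines of Python. This is only an illustration: the odds ratios of 3 and 2 come from the example above, but the joint odds ratio of 10 and the function name describe_joint_effect are made up for the sketch, and the labels apply to the multiplicative scale only.

```python
# Odds ratios from the two hypothetical studies described in the text.
or_asbestos = 3.0  # asbestos-exposed vs. unexposed workers
or_smoking = 2.0   # cigarette smokers vs. nonsmokers

# The two naive predictions for workers who have both exposures.
additive_guess = or_asbestos + or_smoking        # 3 + 2
multiplicative_guess = or_asbestos * or_smoking  # 3 x 2


def describe_joint_effect(or_joint, or_a, or_b):
    """Compare an observed joint odds ratio with the multiplicative expectation."""
    expected = or_a * or_b
    if or_joint > expected:
        return "synergistic (greater than multiplicative)"
    if or_joint < expected:
        return "antagonistic (less than multiplicative)"
    return "exactly multiplicative (no interaction on this scale)"


# Hypothetical joint odds ratio, invented purely for illustration.
hypothetical_joint_or = 10.0

print("Additive guess:", additive_guess)
print("Multiplicative guess:", multiplicative_guess)
print(describe_joint_effect(hypothetical_joint_or, or_asbestos, or_smoking))
```

Any joint odds ratio above the multiplicative guess of six is labeled synergistic here, which matches the asbestos and smoking pattern described above.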

How and when do you model an interaction in regression? Typically, you first fit your final model

using a multivariable regression approach (see the earlier section “Adjusting for confounders in

regression” for more on this). Next, once the final model is fit, you add a term that multiplies the exposure

covariate (or covariates) by the confounder that you believe is the other part of the interaction. After

that, you look at the p value on the interaction term and decide whether or not to keep the interaction.
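
As a rough sketch of that last step, the Python below estimates an interaction term and its p value by hand. It rests on several assumptions that are not part of the text: the cell counts are invented, the model contains only two binary covariates plus their product (no other confounders), and the 0.05 cutoff is just the conventional choice. In that special case the logistic model is saturated, so the coefficients can be read directly from the observed odds instead of being fitted by statistical software, which is what you would use in practice.

```python
import math

# Hypothetical counts, invented for illustration only:
# (asbestos, smoker) -> (lung cancer cases, non-cases)
counts = {
    (0, 0): (10, 990),
    (1, 0): (30, 970),
    (0, 1): (20, 980),
    (1, 1): (150, 850),
}


def log_odds(cell):
    """Observed log odds of lung cancer in one exposure group."""
    cases, noncases = counts[cell]
    return math.log(cases / noncases)


# Closed-form estimates for the saturated logistic model
#   logit(p) = b0 + b1*asbestos + b2*smoker + b3*(asbestos*smoker)
b0 = log_odds((0, 0))
b1 = log_odds((1, 0)) - b0            # asbestos effect among nonsmokers
b2 = log_odds((0, 1)) - b0            # smoking effect among the unexposed
b3 = log_odds((1, 1)) - b0 - b1 - b2  # interaction term

# Wald standard error of b3: square root of the summed reciprocal cell counts.
se_b3 = math.sqrt(sum(1 / n for cell in counts.values() for n in cell))
z = b3 / se_b3
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value

alpha = 0.05  # conventional threshold; an assumption, not a rule from the text
keep_interaction = p_value < alpha

print(f"b3 = {b3:.3f}, SE = {se_b3:.3f}, p = {p_value:.4f}")
print("Keep the interaction term:", keep_interaction)
```

With real data and additional confounders in the final model, the estimates no longer have this closed form, so you would read the interaction coefficient and its p value from your software's regression output instead.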

Imagine making a model for the study of asbestos workers, cigarette smoking, and lung cancer. The

variable asbestos is coded 1 for workers exposed to asbestos and 0 for workers not exposed to

asbestos, and the variable smoker is coded 1 for cigarette smokers and 0 for nonsmokers. The final

model would already have asbestos and smoker in it, so the interaction model would add the

additional covariate asbestos × smoker, which is called the higher-order interaction term. For

individuals who have a 0 for either asbestos or smoker or both, this term drops out of their individual

predicted probability (because 1 × 0 = 0, and 0 × 0 = 0). Therefore, if this term is statistically

significant and its coefficient is positive, then individuals exposed to both factors (the only ones for whom the term is nonzero)

have a statistically significantly greater risk of the outcome than the two main effects alone would predict, and the interaction term should be kept in

the model.
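
To see how the interaction term drops out for anyone with a 0 on either variable, consider this minimal Python sketch of the predicted probability. The coefficient values and the function name predicted_probability are hypothetical, chosen only to make the pattern visible; they are not estimates from any study.

```python
import math

# Hypothetical coefficients on the log-odds scale, chosen only for illustration.
b0 = -4.6            # intercept: unexposed nonsmokers
b_asbestos = 1.1     # main effect of asbestos
b_smoker = 0.7       # main effect of smoking
b_interaction = 1.0  # coefficient on asbestos x smoker


def predicted_probability(asbestos, smoker):
    """Predicted probability of lung cancer from the interaction model."""
    interaction = asbestos * smoker  # equals 1 only when both covariates are 1
    logit = (b0
             + b_asbestos * asbestos
             + b_smoker * smoker
             + b_interaction * interaction)
    return 1 / (1 + math.exp(-logit))


for asbestos in (0, 1):
    for smoker in (0, 1):
        p = predicted_probability(asbestos, smoker)
        print(f"asbestos={asbestos} smoker={smoker} "
              f"interaction={asbestos * smoker} probability={p:.4f}")
```

Only the row with asbestos=1 and smoker=1 picks up the extra b_interaction contribution; the other three rows are determined by the intercept and the main effects alone.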